ERGMs

Jono Tuke

March 20, 2023

Packages

remotes::install_github("DougLuke/UserNetR")
pacman::p_load(tidyverse, ggraph, statnet, igraph, UserNetR, ergm, tidygraph)
theme_set(theme_bw())

The data

Moreno sociogram

  • 33 Nodes: students in 4th grade class in 1933 America.
  • 46 Edges: friendship.
  • Metadata: for each node we have gender.

Summary

data("Moreno")
Moreno <- as_tbl_graph(Moreno)
Moreno
# A tbl_graph: 33 nodes and 46 edges
#
# An undirected simple graph with 2 components
#
# Node Data: 33 × 2 (active)
  gender na   
   <dbl> <lgl>
1      2 FALSE
2      2 FALSE
3      2 FALSE
4      2 FALSE
5      2 FALSE
6      2 FALSE
# … with 27 more rows
#
# Edge Data: 46 × 3
   from    to na   
  <int> <int> <lgl>
1     1     2 FALSE
2     1     5 FALSE
3     1    18 FALSE
# … with 43 more rows

Cleaning - nodes

Moreno <- 
  Moreno |> 
  activate(nodes) |> 
  mutate(
    gender = ifelse(
      gender == 1, "Female", "Male"
    )
  ) |> 
  select(
    -na
  )
Moreno
# A tbl_graph: 33 nodes and 46 edges
#
# An undirected simple graph with 2 components
#
# Node Data: 33 × 1 (active)
  gender
  <chr> 
1 Male  
2 Male  
3 Male  
4 Male  
5 Male  
6 Male  
# … with 27 more rows
#
# Edge Data: 46 × 3
   from    to na   
  <int> <int> <lgl>
1     1     2 FALSE
2     1     5 FALSE
3     1    18 FALSE
# … with 43 more rows

Cleaning - edges

Moreno <- 
  Moreno |> 
  activate(edges) |> 
  select(-na)
Moreno
# A tbl_graph: 33 nodes and 46 edges
#
# An undirected simple graph with 2 components
#
# Edge Data: 46 × 2 (active)
   from    to
  <int> <int>
1     1     2
2     1     5
3     1    18
4     1     6
5     1    16
6     2     3
# … with 40 more rows
#
# Node Data: 33 × 1
  gender
  <chr> 
1 Male  
2 Male  
3 Male  
# … with 30 more rows

Visualization

ggraph(Moreno) + 
  geom_edge_link() + 
  geom_node_point(aes(fill = gender), size = 10, shape = 21) + 
  theme_graph() + 
  scale_fill_brewer(palette = "Set1") + 
  theme(legend.position = "bottom") + 
  labs(fill = "Gender")

Visualization

Five number summary of networks

Size

The number of nodes, vertices, or actors:

igraph::vcount(Moreno)
[1] 33

Density

Proportion of observed ties to maximum number of ties:

igraph::edge_density(Moreno)
[1] 0.08712121

Components

A subgraph in which all actor are connected directly or indirectly:

igraph::components(Moreno)
$membership
 [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2

$csize
[1] 31  2

$no
[1] 2

Diameter

The diameter then for an entire network is the longest of the shortest paths across all pairs of nodes.

igraph::diameter(Moreno)
[1] 11

Clustering coef

Transitivity is defined as the proportion of closed triangles (triads where all three ties are observed) to the total number of open and closed triangles (triads where either two or all three ties are observed).

igraph::transitivity(Moreno)
[1] 0.2857143

Clustering

play_erdos_renyi(n = 33, m = 46) |> 
  igraph::transitivity()
[1] 0

Modelling networks

Model 1 Logisitic regression

  • Distribution:

\[ Y_{ij} \sim Bern(p_{ij}) \]

  • Linear predictor:

\[ \eta_{ij} = \boldsymbol{x}^T_{ij}\boldsymbol{\beta} \]

  • Link function:

\[ p_{ij} = \frac{e^{\eta_{ij}}}{1 + e^{\eta_{ij}}} \]

Network structures

  1. Non-uniform degree distribution
  2. Homophily
  3. Transitivity
  4. Reciprocity

Exponential random graph models

\[ P( \boldsymbol{Y} = \boldsymbol{y}) = \frac{1}{\kappa} \exp\left( \sum_A\eta_Ag_A( \boldsymbol{y}) \right), \] where \(\eta_A\) is the parameter associated with configurations \(A\), and \[ g_A( \boldsymbol{y}) = \prod_{y_{ij}\in A}y_{ij} \]

Dependency graphs

  • Nodes are the tie variables.
  • Edges indicate dependency.

Bernoulli random graphs

Markov random graphs

Hammersley–Clifford theorem

\[ P( \boldsymbol{Y} = \boldsymbol{y}) = \frac{1}{\kappa} \exp \sum_{A \subseteq J^*}\theta_Az_{A}(\boldsymbol{y}) \] where \(J^*\) is the set of all cliques in the dependency graph, \(\theta_A\) is the parameter associated with the clique (configuration) \(A\), and \(z_A(\boldsymbol{y})\) is an indicator variable with value 1 if configuration \(A\) is in the graph.

Application to Moreno dataset

Null model

Moreno_network <- intergraph::asNetwork(Moreno)
null <- ergm(Moreno_network ~ edges)
summary(null)
Call:
ergm(formula = Moreno_network ~ edges)

Maximum Likelihood Results:

      Estimate Std. Error MCMC % z value Pr(>|z|)    
edges  -2.3493     0.1543      0  -15.22   <1e-04 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

     Null Deviance: 732.0  on 528  degrees of freedom
 Residual Deviance: 312.4  on 527  degrees of freedom
 
AIC: 314.4  BIC: 318.7  (Smaller is better. MC Std. Err. = 0)

Node degree

Moreno <- 
  Moreno |> 
  activate(nodes) |> 
  mutate(
    degree = centrality_degree()
  )
Moreno
# A tbl_graph: 33 nodes and 46 edges
#
# An undirected simple graph with 2 components
#
# Node Data: 33 × 2 (active)
  gender degree
  <chr>   <dbl>
1 Male        5
2 Male        2
3 Male        4
4 Male        3
5 Male        3
6 Male        2
# … with 27 more rows
#
# Edge Data: 46 × 2
   from    to
  <int> <int>
1     1     2
2     1     5
3     1    18
# … with 43 more rows

Node degree

Node factor

gender <- ergm(Moreno_network ~ edges + nodefactor('gender'))
summary(gender)
Call:
ergm(formula = Moreno_network ~ edges + nodefactor("gender"))

Maximum Likelihood Results:

                       Estimate Std. Error MCMC % z value Pr(>|z|)    
edges                  -2.43210    0.28194      0  -8.626   <1e-04 ***
nodefactor.gender.Male  0.07914    0.22220      0   0.356    0.722    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

     Null Deviance: 732.0  on 528  degrees of freedom
 Residual Deviance: 312.3  on 526  degrees of freedom
 
AIC: 316.3  BIC: 324.8  (Smaller is better. MC Std. Err. = 0)

Homophily

Moreno |> 
  activate(edges) |> 
  mutate(agreement = .N()$gender[from] == .N()$gender[to]) |> 
  data.frame() |> 
  count(agreement)
  agreement  n
1     FALSE  1
2      TRUE 45

Homophily

homophily <- ergm(Moreno_network ~ edges + nodematch('gender'))
summary(homophily)
Call:
ergm(formula = Moreno_network ~ edges + nodematch("gender"))

Maximum Likelihood Results:

                 Estimate Std. Error MCMC % z value Pr(>|z|)    
edges              -5.602      1.001      0  -5.596   <1e-04 ***
nodematch.gender    4.057      1.015      0   3.999   <1e-04 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

     Null Deviance: 732.0  on 528  degrees of freedom
 Residual Deviance: 251.3  on 526  degrees of freedom
 
AIC: 255.3  BIC: 263.8  (Smaller is better. MC Std. Err. = 0)

Differential homophily

diff_homo <- ergm(Moreno_network ~ edges + 
                    nodematch('gender',diff = TRUE)
)
summary(diff_homo)

Differential homophily

Call:
ergm(formula = Moreno_network ~ edges + nodematch("gender", diff = TRUE))

Maximum Likelihood Results:

                        Estimate Std. Error MCMC % z value Pr(>|z|)    
edges                     -5.602      1.001      0  -5.596   <1e-04 ***
nodematch.gender.Female    4.052      1.030      0   3.935   <1e-04 ***
nodematch.gender.Male      4.062      1.026      0   3.958   <1e-04 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

     Null Deviance: 732.0  on 528  degrees of freedom
 Residual Deviance: 251.3  on 525  degrees of freedom
 
AIC: 257.3  BIC: 270.1  (Smaller is better. MC Std. Err. = 0)

Transitivity

hom.gof <- gof(homophily,GOF = ~ degree + triadcensus - model)
plot(hom.gof)

Transitivity

Transitivity

hom.gof

Goodness-of-fit for degree 

        obs min mean max MC p-value
degree0   0   0 1.56   5       0.32
degree1   2   1 5.07  10       0.20
degree2  16   3 8.52  15       0.00
degree3   6   3 7.69  13       0.58
degree4   6   2 5.66  11       1.00
degree5   2   0 2.84   9       0.94
degree6   1   0 1.08   5       1.00
degree7   0   0 0.44   3       1.00
degree8   0   0 0.10   1       1.00
degree9   0   0 0.04   1       1.00

Goodness-of-fit for triad census 

               obs  min    mean  max MC p-value
triadcensus.0 4125 3758 4135.00 4429       0.88
triadcensus.1 1246  971 1211.69 1522       0.68
triadcensus.2   75   54  102.15  163       0.26
triadcensus.3   10    0    7.16   18       0.56

Compare models

AIC(null, gender, homophily, diff_homo)
          df      AIC
null       1 314.3925
gender     2 316.2655
homophily  2 255.2550
diff_homo  3 257.2541

Simulate models

set.seed(2023)
sim  <- simulate(homophily)

Simulated models

original data

simulated data